APPENDIX A:

VERSION WITH MARKINGS TO SHOW CHANGES MADE TO SPECIFICATION 
The paragraph beginning on page 6 at line 10: 

FIGS. 3A and 3B [is] are an overview of selected components of the basic LSA
paradigm, in accordance with one embodiment of the present invention;

The paragraph beginning on page 6 at line 12: 

FIGS. 4A and 4B [is] are an overview of selected components of the adaptive
LSA paradigm, in accordance with one embodiment of the present invention;

The paragraph beginning on page 6 at line 14:

FIGS. 5A and 5B [is] are an overview of selected components of the matrix
transformation of the adaptive LSA paradigm, in accordance with one embodiment of the 
present invention; 

The paragraph beginning on page 6 at line 16: 

FIGS. 6A and 6B [is] are an overview of selected components of the vector
transformation of the adaptive LSA paradigm, in accordance with one embodiment of the 
present invention; 

The paragraph beginning on page 6 at line 18: 

FIGS. 7A and 7B [is] are an overview of selected components of prior art baseline
adaptation; 

The paragraph beginning on page 11 at line 18:

FIGS. 3A and 3B illustrate[s] selected components of the basic LSA paradigm 300 used to construct the continuous vector space $\mathcal{S}$, referenced in FIG. 3B as LSA space $\mathcal{S}$ 316. The LSA paradigm 300 first captures the semantic patterns of the word-document co-occurrences that appeared in the training corpus $\mathcal{T}$ 202 by constructing a word-document matrix W 302 of dimension $M \times N$, whose entries 304 suitably reflect the extent to which word $w_i$ 208 appeared in document $d_j$ 204, and then performing a singular value decomposition (SVD) of the word-document matrix W 302 having an order of decomposition $R \ll \min\{M, N\}$, as in [1]:

$$W = U S V^T \qquad (1)$$

where U 306 is the $M \times R$ left singular matrix of row vectors $u_i$ ($1 \le i \le M$), S 308 is the $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \dots \ge s_R > 0$, and $V^T$ is the transpose of V 310, the $N \times R$ right singular matrix of row vectors $v_j$ ($1 \le j \le N$).

The value of R can vary depending on the values of M and N, and is chosen by balancing computational speed (associated with lower values of R) against accuracy (associated with higher values of R). Typical values for R range from 5 to 100.

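As an illustration only, the decomposition in (1) can be sketched with NumPy's dense SVD, keeping the R largest singular triplets (which gives the best rank-R approximation of W in the least-squares sense). The toy sizes, the random count matrix standing in for the weighted entries 304, and the helper name `lsa_space` are hypothetical; a large system would more likely use a sparse or iterative solver.

```python
import numpy as np

def lsa_space(W: np.ndarray, R: int):
    """Rank-R truncated SVD of an M x N word-document matrix, as in (1).

    Returns U (M x R), the R largest singular values s (descending),
    V (N x R), and the scaled word vectors U S and document vectors V S.
    """
    U_full, s_full, Vt_full = np.linalg.svd(W, full_matrices=False)
    U = U_full[:, :R]
    s = s_full[:R]
    V = Vt_full[:R, :].T
    word_vecs = U * s   # row i is u_i S (column k scaled by s_k)
    doc_vecs = V * s    # row j is v_j S
    return U, s, V, word_vecs, doc_vecs


rng = np.random.default_rng(0)
M, N, R = 8, 6, 3                                   # hypothetical toy sizes
W = rng.poisson(1.0, size=(M, N)).astype(float)     # stand-in for weighted counts 304
U, s, V, word_vecs, doc_vecs = lsa_space(W, R)
```

Each row of `word_vecs` places one word in the R-dimensional LSA space, and each row of `doc_vecs` places one document there.
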
The paragraph beginning on page 13 at line 9: 

FIGS. 4A and 4B illustrate[s] selected components of the adaptive LSA paradigm 400 using latent semantic adaptation in accordance with an embodiment of the present invention. The adaptive LSA paradigm 400 extends the basic LSA paradigm 300 so that some or all of the data in new documents 110 are taken into account through incremental adaptation of the original LSA space $\mathcal{S}$ 316 in a way that is computationally efficient. Adaptation of the original LSA space $\mathcal{S}$ 316 ensures that the semantic classification error rate of the semantic classification unit 112 does not substantially increase as the new words and documents 110 vary from those contained in the original training corpus $\mathcal{T}$ 202.

The paragraph beginning on page 13 at line 25: 

With reference to FIG. 4A, if n additional documents contain words drawn from the original underlying vocabulary $\mathcal{V}$ 206 plus m words previously unseen (i.e., out-of-vocabulary words), then the adaptive LSA paradigm 400 constructs a word-document matrix $\bar{W}$ of dimension $(M+m) \times (N+n)$ in the same manner as described for generating matrix W 302 in the basic LSA paradigm 300 in FIGS. 3A and 3B. Using the same order of decomposition R, the SVD of $\bar{W}$ 402 leads to:

$$\bar{W} = \bar{U} \bar{S} \bar{V}^T \qquad (4)$$

where $\bar{U}$ 406 is the left singular matrix of dimension $(M+m) \times R$, $\bar{S}$ 408 is the diagonal matrix of dimension $R \times R$, and $\bar{V}$ 410 is the right singular matrix of dimension $(N+n) \times R$, each having the same definitions and properties as described above for W, U, S, and V in FIGS. 3A and 3B.

The paragraph beginning on page 14 at line 9: 

As shown in FIG. 4A, the m new words are gathered in the $m \times (N+n)$ matrix $\bar{C} = [C \;\; E]$ 422, and the n new documents are gathered in the $(M+m) \times n$ matrix $\bar{D} = [D^T \;\; E^T]^T$ 424. $\bar{U}$ 406 is expressed as $[\bar{U}_1^T \;\; \bar{U}_2^T]^T$, where $\bar{U}_1^T$ 436 is the transpose of the $M \times R$ upper block of the left singular matrix and $\bar{U}_2^T$ 438 is the transpose of its $m \times R$ lower block. Likewise, $\bar{V}$ 410 is expressed as $[\bar{V}_1^T \;\; \bar{V}_2^T]^T$, where $\bar{V}_1^T$ 439 is the transpose of the $N \times R$ upper block of the right singular matrix and $\bar{V}_2^T$ 440 is the transpose of its $n \times R$ lower block. The new decomposition of $\bar{W}$ expressed in (4) leads to a different LSA space $\bar{\mathcal{S}}$ 416, in which the word and document vectors are now given by the scaled row vectors $\bar{u}_i \bar{S}$ 418 and $\bar{v}_j \bar{S}$ 420 (i.e., the rows of $\bar{U}\bar{S}$ 412 and $\bar{V}\bar{S}$ 414), which characterize the positions of word $w_i$ and document $d_j$.
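
The block layout can be sketched in NumPy as follows, assuming the arrangement of FIG. 4A: $\bar{W} = \begin{bmatrix} W & D \\ C & E \end{bmatrix}$, with the m new words as the last rows and the n new documents as the last columns. The random blocks and toy sizes are hypothetical; the sketch only shows how $\bar{U}$ and $\bar{V}$ split by rows into $\bar{U}_1, \bar{U}_2$ and $\bar{V}_1, \bar{V}_2$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, m, n, R = 8, 6, 2, 3, 3          # hypothetical toy sizes
W = rng.random((M, N))   # original words x original documents
D = rng.random((M, n))   # original words x new documents
C = rng.random((m, N))   # new words x original documents
E = rng.random((m, n))   # new words x new documents

# Extended (M+m) x (N+n) matrix, laid out as in FIG. 4A.
W_bar = np.block([[W, D], [C, E]])

Ub, sb, Vbt = np.linalg.svd(W_bar, full_matrices=False)
U_bar, s_bar, V_bar = Ub[:, :R], sb[:R], Vbt[:R, :].T   # same order R, per (4)

U_bar_1, U_bar_2 = U_bar[:M], U_bar[M:]   # M x R and m x R row blocks
V_bar_1, V_bar_2 = V_bar[:N], V_bar[N:]   # N x R and n x R row blocks

word_vecs_bar = U_bar * s_bar   # rows u_bar_i S_bar (418)
doc_vecs_bar = V_bar * s_bar    # rows v_bar_j S_bar (420)
```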

The paragraph beginning on page 14 at line 20: 

FIGS. 7A and 7B illustrate[s] the prior art approach referred to as baseline adaptation 700, where the distinction between the SVD in (1) of the original word-document co-occurrence matrix W 302 in FIG. 3A and the SVD in (4) of the extended word-document co-occurrence matrix $\bar{W}$ 402 in FIG. 4A is ignored by making the (obviously invalid) assumption that the original LSA space $\mathcal{S}$ 316 in FIG. 3B is the same as the new LSA space $\bar{\mathcal{S}}$ 416 in FIG. 4B. In other words, in baseline adaptation 700, the SVD in (1) is still assumed to be valid even after the new documents become available, and the problem is reduced to representing the new data in the original LSA space $\mathcal{S}$ 316.

The paragraph beginning on page 15 at line 1:

Referring now to FIGS. 4A and 7B, the baseline adaptation approach 700 treats the portions of the matrix $\bar{W}$ 402 identified as C 430 and D 432 merely as additional rows or columns extending the original matrix W 302, and discards altogether the portion of the extended matrix $\bar{W}$ 402 identified as E 434. This has the effect of ignoring significant amounts of new data, including any out-of-vocabulary words in the new documents.

The paragraph beginning on page 15 at line 7:

Using the baseline adaptation approach 700, the representation of those portions of the new data that will be added to the original LSA space $\mathcal{S}$ 316 is obtained from the SVD of C 430 and D 432 as follows:

$$C = Y S V^T \qquad (5)$$

$$D = U S Z^T \qquad (6)$$

where the $m \times R$ matrix Y 426 and the $n \times R$ matrix Z 428 are defined a posteriori (as plug-ins) to satisfy these relationships. In essence, in the baseline adaptation framework 700, the role of matrices Y 426 and Z 428 is to "extend" the original matrices U 306 and V 310 to accommodate the new data. The original word and document vectors $u_i S$ 318 and $v_j S$ 320 are still given by the rows of US 312 and VS 314, but the new word and document vectors $y_i S$ 446 and $z_j S$ 448 are given by the rows of YS 442 and ZS 444, respectively. From (5) and (6), and because U and V are column-orthonormal ($U^T U = V^T V = I_R$), these are seen to be:

$$YS = C V \qquad (7)$$

$$ZS = D^T U \qquad (8)$$

The effect, illustrated in FIG. 7B, is that the original LSA space $\mathcal{S}$ 316 becomes populated with the new data, i.e., the new word and document vectors $y_i S$ 446 and $z_j S$ 448, hence the name "folding-in."
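
Folding-in as given by (7) and (8) amounts to two matrix products. In the NumPy sketch below, the function name `fold_in` and the toy data are hypothetical: it builds C and D that satisfy (5) and (6) exactly and then recovers Y and Z. Because $V^T V = U^T U = I_R$, the same products also give the least-squares fit of (5) and (6) when C and D are not exactly of that form.

```python
import numpy as np

def fold_in(C, D, U, s, V):
    """Baseline folding-in, per (7) and (8).

    C: m x N (new words over original documents), D: M x n (original words
    over new documents). Returns YS (m x R), ZS (n x R), Y, and Z.
    """
    YS = C @ V        # (7)
    ZS = D.T @ U      # (8)
    Y = YS / s        # divides column k by s_k, i.e. Y = (YS) S^{-1}
    Z = ZS / s
    return YS, ZS, Y, Z


rng = np.random.default_rng(2)
M, N, m, n, R = 8, 6, 2, 3, 3                        # hypothetical toy sizes
U, _ = np.linalg.qr(rng.standard_normal((M, R)))     # M x R, orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((N, R)))     # N x R, orthonormal columns
s = np.array([3.0, 2.0, 1.0])
Y0 = rng.standard_normal((m, R))
Z0 = rng.standard_normal((n, R))
C = Y0 @ np.diag(s) @ V.T    # satisfies (5) exactly
D = U @ np.diag(s) @ Z0.T    # satisfies (6) exactly
YS, ZS, Y, Z = fold_in(C, D, U, s, V)
```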

The paragraph beginning on page 15 at line 24: 

A major drawback to the above-described baseline adaptation approach 700 illustrated in FIG. 7B is poor performance: even when populated with the new word and document vectors $y_i S$ 446 and $z_j S$ 448, the original LSA space $\mathcal{S}$ 316 still yields a high misclassification error rate when the new words and documents vary from the original training corpus $\mathcal{T}$ 202, e.g., when the new documents contain several new words not in the original training corpus.

The paragraph beginning on page 16 at line 3: 

In contrast, the latent semantic adaptation approach of the present invention achieves significant reductions in the misclassification error rate. Unlike baseline adaptation 700, the latent semantic adaptation approach of the present invention recognizes that there is an important distinction between the SVD in (1) of the original word-document co-occurrence matrix W 302 in FIG. 3A and the SVD in (4) of the extended word-document co-occurrence matrix $\bar{W}$ 402 in FIG. 4A that must be taken into account, since the original LSA space $\mathcal{S}$ 316 in FIG. 3B is not the same as the new LSA space $\bar{\mathcal{S}}$ 416 in FIG. 4B. In other words, the SVD in (1) is no longer valid after the new documents become available, so the problem is more than just representing the new data in the original LSA space $\mathcal{S}$ 316. Therefore, in one embodiment, the latent semantic adaptation approach treats the portions of the matrix $\bar{W}$ 402 identified as C 430 and/or D 432 in FIG. 4A as new data that must be accounted for in a new LSA space $\bar{\mathcal{S}}$ 416. In one embodiment, the portion of the matrix $\bar{W}$ 402 identified as E 434 in FIG. 4A is also treated as new data that must be accounted for in the new LSA space $\bar{\mathcal{S}}$ 416.

The paragraph beginning on page 16 at line 17:

In one embodiment of latent semantic adaptation, the scaled row vectors (i.e., the rows of $\bar{U}\bar{S}$ 412 and $\bar{V}\bar{S}$ 414) are obtained directly from the SVD of the entire matrix $\bar{W}$ 402 in (4) using a latent semantic adaptation framework 400 as defined in the equations that follow. By inspection from FIG. 4A,

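$$C = \bar{U}_2 \bar{S} \bar{V}_1^T \qquad (9)$$

$$D = \bar{U}_1 \bar{S} \bar{V}_2^T \qquad (10)$$

and

$$W = \bar{U}_1 \bar{S} \bar{V}_1^T \qquad (11)$$

$$E = \bar{U}_2 \bar{S} \bar{V}_2^T \qquad (12)$$

where $\bar{U}$ and $\bar{V}$ are each column-orthonormal, i.e., $\bar{U}^T \bar{U} = \bar{V}^T \bar{V} = I_R$ (the identity matrix of order R). The orthogonality constraints can also be expressed in terms of $\bar{U}_1$, $\bar{U}_2$, $\bar{V}_1$, and $\bar{V}_2$ as follows:

$$\bar{U}^T \bar{U} = I_R = \bar{U}_1^T \bar{U}_1 + \bar{U}_2^T \bar{U}_2 \qquad (13)$$

$$\bar{V}^T \bar{V} = I_R = \bar{V}_1^T \bar{V}_1 + \bar{V}_2^T \bar{V}_2 \qquad (14)$$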

In one embodiment, the foregoing equations (9)-(14) define the latent semantic adaptation framework 400 of the method of the present invention. The latent semantic adaptation framework 400 is used to solve for the "extension" SVD matrices $\bar{U}$ 406, $\bar{S}$ 408, and $\bar{V}$ 410 as a function of the original SVD matrices U 306, S 308, and V 310 and the "extension" matrices Y 426 and Z 428.
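
One way to check (9)-(14) numerically is sketched here with NumPy, under the assumption that $\bar{W}$ has rank exactly R, so that the rank-R SVD in (4) reproduces it with no truncation error. The sketch builds such a matrix from hypothetical random factors, splits the SVD factors by rows, and compares each block.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, m, n, R = 8, 6, 2, 3, 3          # hypothetical toy sizes
# Hypothetical extended matrix of exact rank R, so (4) holds with equality.
W_bar = rng.standard_normal((M + m, R)) @ rng.standard_normal((R, N + n))

Ub, sb, Vbt = np.linalg.svd(W_bar, full_matrices=False)
U_bar, S_bar, V_bar = Ub[:, :R], np.diag(sb[:R]), Vbt[:R, :].T
U1, U2 = U_bar[:M], U_bar[M:]     # M x R and m x R
V1, V2 = V_bar[:N], V_bar[N:]     # N x R and n x R

# Blocks of W_bar as labelled in FIG. 4A.
W, D = W_bar[:M, :N], W_bar[:M, N:]
C, E = W_bar[M:, :N], W_bar[M:, N:]

checks = {
    "(9)  C = U2 S V1^T": np.allclose(C, U2 @ S_bar @ V1.T),
    "(10) D = U1 S V2^T": np.allclose(D, U1 @ S_bar @ V2.T),
    "(11) W = U1 S V1^T": np.allclose(W, U1 @ S_bar @ V1.T),
    "(12) E = U2 S V2^T": np.allclose(E, U2 @ S_bar @ V2.T),
    "(13) U1^T U1 + U2^T U2 = I": np.allclose(U1.T @ U1 + U2.T @ U2, np.eye(R)),
    "(14) V1^T V1 + V2^T V2 = I": np.allclose(V1.T @ V1 + V2.T @ V2, np.eye(R)),
}
```

With real co-occurrence data, $\bar{W}$ is not of exact rank R, and (9)-(12) then hold only approximately.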

The paragraph beginning on page 17 at line 11:

According to one embodiment, the solution is obtained by setting up a latent semantic adaptation transformation 500, as illustrated in FIGS. 5A and 5B, based on the assumptions previously noted: that the dimension R of the original LSA space $\mathcal{S}$ 316 is low enough that none of the corresponding R singular values are zero, and that the transformation necessary to adapt the original LSA space $\mathcal{S}$ 316 is invertible. Starting with $\bar{S}$ 408, the shift from S 308 in FIG. 3A to $\bar{S}$ 408 in FIG. 4A can be captured as illustrated in FIGS. 5A and 5B by the following expressions:

$$\bar{U}_1 = U G \qquad (15)$$

$$\bar{V}_1 = V H \qquad (16)$$

where G 508 and H 518 are $(R \times R)$ matrices that, according to the second assumption, are assumed to be invertible. Taken together, (15) and (16) define a latent semantic adaptation matrix transformation 500 to apply to the original SVD matrices U 306 and V 310 to update them according to the new data.
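
Because U and V have orthonormal columns, (15) and (16) can be solved by projection, $G = U^T \bar{U}_1$ and $H = V^T \bar{V}_1$, whenever $\bar{U}_1$ and $\bar{V}_1$ lie in the column spaces of U and V. The NumPy sketch below assumes an idealized case in which that holds exactly (the extended matrix has rank exactly R, and so does its original block W); the data and the helper name `rank_r_svd` are hypothetical. It also checks $S = G \bar{S} H^T$, which follows from substituting (15) and (16) into (11) and comparing with (1).

```python
import numpy as np

def rank_r_svd(A, R):
    """Rank-R truncated SVD: returns U (rows x R), diagonal S (R x R), V (cols x R)."""
    Ua, sa, Vat = np.linalg.svd(A, full_matrices=False)
    return Ua[:, :R], np.diag(sa[:R]), Vat[:R, :].T


rng = np.random.default_rng(4)
M, N, m, n, R = 8, 6, 2, 3, 3                 # hypothetical toy sizes
# Idealized extended matrix of exact rank R; its original block W then also has rank R.
W_bar = rng.standard_normal((M + m, R)) @ rng.standard_normal((R, N + n))
W = W_bar[:M, :N]

U, S, V = rank_r_svd(W, R)                    # original decomposition, (1)
U_bar, S_bar, V_bar = rank_r_svd(W_bar, R)    # extended decomposition, (4)
U1, V1 = U_bar[:M], V_bar[:N]                 # upper row blocks of U_bar and V_bar

# U^T U = V^T V = I_R, so (15) and (16) are solved by projecting onto U and V.
G = U.T @ U1
H = V.T @ V1
```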

The paragraph beginning on page 20 at line 7: 

From equations (17), (28), and (29), it is clear that: 

$$(G\bar{S})(G\bar{S})^T = G \bar{S}^2 G^T = S H^{-T} H^{-1} S = S(I_R + Z^T Z)S \qquad (37)$$

$$(H\bar{S})(H\bar{S})^T = H \bar{S}^2 H^T = S G^{-T} G^{-1} S = S(I_R + Y^T Y)S \qquad (38)$$

Thus, it is also possible to obtain $G\bar{S}$ and $H\bar{S}$ directly through Choleski decomposition, in a manner analogous to that mentioned above for G 508 and H 518. In fact, as illustrated in FIGS. 6A and 6B, if J [608] 618 and K [618] 608 are the solutions of the relevant Choleski decompositions, viz.:

$$J J^T = I_R + Y^T Y \qquad (39)$$

$$K K^T = I_R + Z^T Z \qquad (40)$$
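then equations (35)-(38) admit as solutions:

$$\bar{U}\bar{S} = \begin{bmatrix} US \\ YS \end{bmatrix} K \qquad (41)$$

$$\bar{V}\bar{S} = \begin{bmatrix} VS \\ ZS \end{bmatrix} J \qquad (42)$$
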
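The following is one way to see where (37)-(42) come from, using only (1), (5), (6), (9)-(16), and the stated invertibility assumptions. It is offered as a reader's check, not as the specification's own derivation, whose intermediate equations (17)-(36) are not shown in this section.

```latex
\begin{aligned}
&\text{(1), (11), (15), (16):} && U S V^T = U G \bar{S} H^T V^T
   \;\Rightarrow\; S = G \bar{S} H^T,\;\; G\bar{S} = S H^{-T},\;\; H\bar{S} = S G^{-T} \\
&\text{(6), (10), (15):} && U S Z^T = U G \bar{S} \bar{V}_2^T = U S H^{-T} \bar{V}_2^T
   \;\Rightarrow\; \bar{V}_2 = Z H \\
&\text{(5), (9), (16):} && Y S V^T = \bar{U}_2 \bar{S} H^T V^T = \bar{U}_2 G^{-1} S V^T
   \;\Rightarrow\; \bar{U}_2 = Y G \\
&\text{(14):} && I_R = H^T H + H^T Z^T Z H
   \;\Rightarrow\; H^{-T} H^{-1} = I_R + Z^T Z \\
&\text{(13):} && I_R = G^T G + G^T Y^T Y G
   \;\Rightarrow\; G^{-T} G^{-1} = I_R + Y^T Y
\end{aligned}
```

Substituting the last two lines into (37) and (38) and using (39) and (40) gives $(G\bar{S})(G\bar{S})^T = (SK)(SK)^T$ and $(H\bar{S})(H\bar{S})^T = (SJ)(SJ)^T$, so $G\bar{S} = SK$ and $H\bar{S} = SJ$ up to a right orthogonal factor, which leaves all inner products between rows unchanged. Taking that factor as $I_R$, $\bar{U}\bar{S} = [U;\, Y]\, G\bar{S} = [US;\, YS]\, K$ and $\bar{V}\bar{S} = [V;\, Z]\, H\bar{S} = [VS;\, ZS]\, J$, which are (41) and (42).
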
The paragraph beginning on page 20 at line 20:

In other words, in accordance with one embodiment of the present invention, the original vectors US 312 and VS 314, as well as the new vectors resulting from the "folding-in" process YS 442 and ZS 444, can be transformed using a latent semantic adaptation vector transformation 600 defined by the transformation matrices K [618] 608 in FIG. 6A and J [608] 618 in FIG. 6B to respectively yield the updated word vectors $\bar{U}\bar{S}$ 412 and document vectors $\bar{V}\bar{S}$ 414. Therefore, equations (41) and (42) make it possible to adapt the original LSA space $\mathcal{S}$ 316 of FIG. 3B to the new LSA space $\bar{\mathcal{S}}$ 416 of FIG. 4B.
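
A minimal NumPy sketch of the vector transformation (39)-(42), assuming dense matrices; the function name `adapt_lsa_vectors` and the toy data are hypothetical. `np.linalg.cholesky` returns the lower-triangular factor L with $A = L L^T$, which serves as J and K here. In the idealized exact-rank case used in the usage lines, the adapted vectors agree with a direct SVD of $\bar{W}$ up to a rotation of the R-dimensional space, so all inner products between word vectors (and between document vectors) match.

```python
import numpy as np

def adapt_lsa_vectors(US, VS, YS, ZS, s):
    """Two-sided latent semantic adaptation of LSA vectors, following (39)-(42).

    US (M x R), VS (N x R): original scaled word / document vectors.
    YS (m x R), ZS (n x R): folded-in new word / document vectors from (7)-(8).
    s: the R original singular values (diagonal of S), used to recover Y and Z.
    Returns U_bar S_bar ((M+m) x R), V_bar S_bar ((N+n) x R), and the factors J, K.
    """
    R = s.shape[0]
    Y, Z = YS / s, ZS / s                          # Y = (YS) S^{-1}, Z = (ZS) S^{-1}
    J = np.linalg.cholesky(np.eye(R) + Y.T @ Y)    # (39): J J^T = I_R + Y^T Y
    K = np.linalg.cholesky(np.eye(R) + Z.T @ Z)    # (40): K K^T = I_R + Z^T Z
    word_vecs = np.vstack([US, YS]) @ K            # (41)
    doc_vecs = np.vstack([VS, ZS]) @ J             # (42)
    return word_vecs, doc_vecs, J, K


# Usage on an idealized extended matrix of exact rank R (hypothetical data).
rng = np.random.default_rng(5)
M, N, m, n, R = 8, 6, 2, 3, 3
W_bar = rng.standard_normal((M + m, R)) @ rng.standard_normal((R, N + n))
W, D, C = W_bar[:M, :N], W_bar[:M, N:], W_bar[M:, :N]

U0, s0, V0t = np.linalg.svd(W, full_matrices=False)
U, s, V = U0[:, :R], s0[:R], V0t[:R, :].T    # original LSA space, (1)
US, VS = U * s, V * s
YS, ZS = C @ V, D.T @ U                       # folding-in, (7)-(8)
word_vecs, doc_vecs, J, K = adapt_lsa_vectors(US, VS, YS, ZS, s)
```

The word vectors are transformed by K, which is built from the new documents (Z), and the document vectors by J, which is built from the new words (Y), matching the cross-coupling in (41) and (42).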

The paragraph beginning on page 21 at line 3: 

In one embodiment of the latent semantic adaptation framework 400, the new information, as reflected through the transformation matrices K [618] 608 and J [608] 618, affects both the original word and document vectors $u_i S$ 318 and $v_j S$ 320 and the new word and document vectors $y_i S$ 446 and $z_j S$ 448; this is referred to as two-sided adaptation. Stated another way, the transformed representation of the new word and document vectors $y_i S$ 446 and $z_j S$ 448 takes into account their own influence on the underlying semantic knowledge that was encapsulated in the original LSA space $\mathcal{S}$ 316 of FIG. 3B (i.e., the existing word and document vectors $u_i S$ 318 and $v_j S$ 320) to yield the transformed word and document vectors $\bar{u}_i \bar{S}$ 418 and $\bar{v}_j \bar{S}$ 420 that populate the new LSA space $\bar{\mathcal{S}}$ 416 of FIG. 4B. As indicated by the arrows in the new LSA space $\bar{\mathcal{S}}$ 416 of FIG. 4B, the positions of both the words and documents represented by the original word and document vectors $u_i S$ 318 and $v_j S$ 320 have shifted from their positions in the original LSA space $\mathcal{S}$ 316 to reflect their changed positions (i.e., their relationships) within the new LSA space $\bar{\mathcal{S}}$ 416. The new LSA space $\bar{\mathcal{S}}$ 416 not only allows for improvements in the misclassification error rate, but also provides the ability to adapt the speech recognition database that embodies the new LSA space 416 in real time, because the application of the transformation matrices K [618] 608 and J [608] 618 is computationally efficient and bypasses the need to re-compute the LSA space.

The paragraph beginning on page 22 at line 3: 

In addition to providing improved performance through a lower
misclassification rate, the latent semantic adaptation framework 400 and the resulting
latent semantic adaptation matrix and vector transformations 500 and 600, respectively,
are also computationally efficient. Compared to the
"folding-in" computations of the baseline adaptation approach 700, the latent semantic 
adaptation matrix and vector transformations 500 and 600 of the latent semantic 
adaptation framework 400 entail less overhead. For example, in terms of the number of 
floating point operations required, the overhead associated with the latent semantic 
adaptation vector transformations 600 embodied in equations (39)-(42) can be expressed 
as: 
For typical values of the various dimensions involved, expression (43) will be dominated by $(M + N)R^2$. Depending on the application, this quantity may fall anywhere between about 50 million (for voice command and control types of speech recognition applications using a limited vocabulary) and more than 1 billion (for large vocabulary transcription). Still, on current high-end machines, this quantity represents at most a few seconds of central processing unit (CPU) time. Compared to recomputing the SVD from scratch, which requires $O(MNR)$ operations, the computational complexity is reduced by a factor of approximately $\min(M, N)/R$. In many speech recognition applications, the reduction factor will be on the order of 1000. In such cases, the latent semantic adaptation framework 400 and the resulting latent semantic adaptation matrix and vector transformations 500 and 600 make it practical to adapt the new LSA space $\bar{\mathcal{S}}$ 416 with real-time word and document updates, whereas SVD re-computation would generally not be feasible.
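
The reduction factor quoted above can be checked with simple arithmetic. Taking the dominant $(M+N)R^2$ term for the adaptation and $MNR$ for recomputing the SVD, with constant factors ignored, the ratio is $MN/((M+N)R)$, which always lies between $\min(M,N)/(2R)$ and $\min(M,N)/R$. The helper name `adaptation_speedup` and the dimensions in the example are hypothetical.

```python
def adaptation_speedup(M: int, N: int, R: int) -> float:
    """Ratio of SVD recomputation cost, M*N*R, to the dominant adaptation cost, (M + N)*R**2.

    Constant factors are ignored, so this is an order-of-magnitude estimate only.
    """
    return (M * N * R) / ((M + N) * R ** 2)


# Hypothetical sizes: a 10,000-word vocabulary, 1,000,000 documents, R = 10.
M, N, R = 10_000, 1_000_000, 10
ratio = adaptation_speedup(M, N, R)
print(f"speedup ~ {ratio:.0f}")   # prints "speedup ~ 990"
```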